Abstract

The mediator/wrapper approach is used to integrate 
data from different databases and other data sources by 
introducing a middleware virtual database that provides 
high level abstractions of the integrated data. A frame- 
work is presented for querying XML data through such an 
Object-Oriented (OO) mediator system using an OO
query language. The mediator architecture provides the
possibility to specify OO queries and views over combi-
nations of data from XML documents, relational data-
bases, and other data sources. In this way interoperability
of XML documents and other data sources is provided. 
The mediator provides OO views of the XML data by 
inferring the schema of imported XML data from the DTD 
of the XML documents, if available, using a set of 
translation rules. A strategy is used for minimizing the
number of types (classes) generated, in order to simplify
querying. If XML documents without DTDs are
read, or if the DTD is incomplete, the system incremen-
tally infers the OO schema from the XML structure while
reading XML data. This requires that the mediator
database be capable of dynamically extending and modi-
fying the OO schema. The paper overviews the architec-
ture of the system and describes incremental rules for
translating XML documents to OO database structures.



1 Introduction 

The capability of database storage and processing is 
central in most information systems. Earlier, organiza- 
tions used monolithic database management systems. 
However, nowadays there are often many isolated data 
repositories distributed over personal computers and 
networks of computers. Those data repositories are often 
heterogeneous because of the differences in the semantics 
of data and DBMS differences such as different data 
models and query languages. For such heterogeneous
databases there is a need to integrate the data and provide
the user with a unified view of the information.

The wrapper/mediator approach [18] has been pro-
posed to help solve the data integration problem. In the
wrapper/mediator architecture, a mediator, which is an
intermediate virtual database, is established between the
data sources and the application using them. A wrapper is 
an interface to a data source that translates data into a 
common data model (CDM) used by the mediator. The 
user accesses the data sources through one or several 
mediator systems that present high-level abstractions 
(views) of combinations of source data. The user does not 
know where the data comes from but is able to retrieve 
and update the data by using a common mediator query 
language. 

In this project, we investigate a method to combine and 
query XML documents (files and data streams) from the 
web through an object-oriented database mediator system. 
We have implemented a system that translates XML data 
to an OO common data model that can be queried using
an OO query language similar to the object extensions of
SQL-99. When available, the DTD meta-data descriptions 
of the XML documents are used to infer the OO schema. 
If there is no DTD specified the system will incrementally 
infer the OO schema while reading the XML documents 
using a set of translation rules. A straightforward transla- 
tion, as is done in [3] for an extended OQL, will generate 
many types (classes) and make the queries clumsy, with 
many levels of function calls. The system therefore has a 
strategy to translate XML elements to functions (attrib- 
utes or methods) rather than types when possible. It may 
then happen that the system first defines an element type 
as a function and then later discovers that the function 
must be migrated to a type to represent all uses of the 
element. Therefore the schema is dynamically modified
during XML data reading whenever the object structure first
inferred is not general enough to represent the XML
data read later. For this reason, incremental rules are
defined that modify the schema dynamically when read-
ing XML documents without DTDs, or when the DTD
does not completely describe the XML data.

Furthermore, the dynamically created OO schema helps
the user understand the database, since the user can
inspect the OO schema discovered by the system.

The rest of this paper is organized as follows: Section 2
presents XML and the background needed to understand this
paper. Section 3 describes the architecture of our object-
oriented mediator system with regard to XML data
sources and relates it to similar approaches. Section 4
contains the translation rules used to parse XML docu-
ments. Section 5 concludes the work and discusses
possible future work.

2 Background 

XML (Extensible Markup Language) [5] was created as
a data exchange and representation standard. XML
provides ways to store complex data structures in a way
suitable for exchange over the Internet. An XML docu-
ment can be a file or a data stream containing nested
elements starting with a root element. It may have meta-
data descriptions through DTDs (Document Type Defini-
tions). These meta-data descriptions impose some struc-
ture and constraints on the XML documents that use the
DTD. However, XML documents may lack DTDs, and
DTDs may also leave parts of the data unspecified; i.e.,
compared to the relational and OO data models, the XML
data model is semi-structured.

Compared to an OO data model, the XML data model
does not have classes, methods, and inheritance; instead it
has element types and attributes [5], which are similar to
classes and attributes in OO data models. Thus XML does
not use a complete OO data model. In order to avoid
confusion in the discussion below, we use the term ele-
ment tag (or just tag) to mean element type.

Example 1 shows a small DTD named person.dtd. In
this example, the DTD restricts the tag person to always
contain a sub-element tagged employee, and each element
tagged person always has the attribute id. We call an
<!ELEMENT statement an element definition. Elements
tagged employee always have two sub-elements tagged
family and given, both of which have the system attribute
"#PCDATA". "#PCDATA" indicates that the element
may have a text string as its value. The element defini-
tions specify containment relationships to sub-elements
along with constraints on the order and occurrences of
sub-elements. In the example the sub-element tagged
given must always follow family inside an element tagged
employee. The element tagged email can have any other
element as sub-element (indicated with ANY); we say that
the sub-elements of email are unspecified.

<!ELEMENT person (employee)>
<!ELEMENT employee (family, given)>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email ANY>
<!ATTLIST person id ID #REQUIRED>

Example 1 The DTD person.dtd 

<!DOCTYPE person SYSTEM "person.dtd">
<person id="669">
  <employee>
    <family> Lin </family>
    <given> Hui </given>
  </employee>
</person>

Example 2 An XML document 

Example 2 is an XML document using the DTD per- 
son.dtd. The element tagged person has an attribute id 
whose value is "669" and a subelement tagged employee 
that contains two subelements: the element tagged family 
whose value is "Lin" and the element tagged given whose 
value is "Hui". Every XML document must have a root 
element specified in the header; in this case it is the ele- 
ment tagged person. 

Several query languages have been proposed for XML 
documents [15]. A graphical query language is introduced
in [2] where queries over XML documents are specified 
graphically. A pattern-based query language is proposed 
in [20] where regular path expressions are used to match 
XML structure and data, and derive new XML data. 

Lore [9] is a database management system for storing 
and querying XML documents. Lore maps an XML docu-
ment into a semi-structured, directed, labeled, and ordered
graph data model called OEM [14]. In [8][17] methods 
are developed to store XML data in relational databases. 

3 Object-oriented data mediation over XML 
data sources 

The purpose of our work is to transparently query 
XML documents from an OO mediator. Because XML data
sources are wrapped, the user does not need to know
where an XML document originates and is able to
combine many XML data sources and query them by
using an OO query language.

In our approach we create an OO schema by translat- 
ing the DTD according to some translation rules. When 
no DTD is present, or when the DTD is incomplete, the 
OO schema is incrementally created while the XML 
document is read. Thus a structured OO schema is incre- 
mentally discovered from DTDs or from XML data. Such 
a schema provides semantically enriched meta-data to 
guide the user when querying the database. It also pro- 
vides a basis for data indexing and efficient query proc-
essing [1].

The XML data is translated into objects when read into 
the OO mediator database. In order to query such data 
with an OO query language the following facilities are 
needed: 

• an OO storage manager to represent object OIDs;

• an OO query processing system; 

• a convention for what constitutes an OO database 
schema from a set of XML documents; 

• a translation mechanism from XML data to objects in 
the mediator. 






We use an OO data model to which XML data is 
translated similar to what is proposed for SGML in [3]. In
[12] a strategy is devised to generate a graph called the
Data Guide that summarizes containment relationships 
among XML elements using the OEM graph data model. 
In our case we generate the containment relationships as 
both type (class) and function (attribute) definitions. 
Unlike [3] we use a strategy to avoid creating types in 
order to simplify the schema and subsequent querying. 

In our system all XML documents having the same 
DTD are regarded as one data source having an OO 
schema inferred from the DTD. The schema of the me- 
diator combines the imported schemas, and the mediator 
database view covers the union of the accessed XML 
documents. Unlike [3], we do not require every XML document
to have a DTD, but incrementally generate and
modify the schema while reading XML documents when 
no DTD is available or the DTD is incomplete. Our 
system can thus handle XML documents with DTDs, with 
incomplete DTDs, or without DTDs. 

When DTDs are translated to OO schemas, some se- 
mantic enrichment is made to infer types and attributes 
from the DTDs in order to simplify the schema according 
to the rules below. Whenever a new XML document
is accessed from the mediator we check
what DTD it refers to, if any, and whether this DTD
has previously been translated to an OO schema in the
mediator. If the DTD has not been translated beforehand, the
mediator will read the DTD to infer types (classes) and 
functions (attributes). XML documents having no DTD at 
all are regarded as belonging to a special schema. 

We have implemented our approach using the object- 
oriented, lightweight, and extensible database mediator 
system AMOS II [16]. It provides a main memory data 
manager to store materialized XML data, and a query 
processing engine. AMOS II has an object-oriented query 
language, AMOSQL, similar to OSQL [7] and the OO 
extensions of SQL-99. 

In addition to basic OO data management facilities 
AMOS II provides facilities for data integration by 



Application 



Mediator 



Wrapper Wrapper 




combining data from distributed and heterogeneous data 
sources using the mediator/wrapper approach. By utiliz-
ing data mediation facilities of AMOS II [6][10][11] we
can query combinations of XML data, relational databases
and other kinds of data sources, as illustrated in Figure 1.
It shows an example of distributed mediation with AMOS 
II where three applications access data from four hetero- 
geneous data sources through three distributed mediators. 

The AMOS II data model contains three basic con- 
structs: objects, types and functions. Objects are used to 
model all entities in the database. Types (i.e. classes) are 
used to describe the object structure and they are organ- 
ized in an OO type hierarchy of subtypes and supertypes. 
Functions are defined on types and are used to represent
properties of objects and relationships between objects.
Functions thus represent attributes and methods. The
AMOS II system is extensible through interfaces to its
kernel in C, Java, and Lisp. In this
work, we utilize the Java interface to build an XML 
wrapper for AMOS II. Our implementation materializes 
read data in the AMOS II storage manager and we have 
developed a translation mechanism from XML to the 
AMOS II OO data model. 

The translation mechanism includes: 

• A strategy for generating schemas from DTDs when 
available. The OO schema is derived from the DTD in 
this case and it describes the contents of one or several 
XML documents referencing the same DTD. 

• A strategy for incrementally populating the database 
using the generated OO schemas while reading XML 
data. Thus OO database update statements are called 
while XML data is read. 

• Strategies for dynamically extending the schema when 
reading XML data with no DTD or when the DTDs are 
not fully describing all the data. In this case both 
database update and schema modification statements 
are dynamically called while XML data is read. 

The AmosXML architecture in Figure 2 is based on 
the existing system in a non-intrusive way, requiring no 
modifications to AMOS II. It uses a Java interface to 
interact with the kernel of the system and is implemented 
as a set of foreign functions implemented in Java. Those 
functions are called from AMOS II to load DTD schema 
and XML data into the database. The read data can be 
queried by the existing query capabilities of the AMOS II 
system. 
Figure 1: An example of distributed AMOS II systems
Figure 2: The AmosXML wrapper

In the AmosXML wrapper, IBM's XML Parser for 
Java [19] is used to parse XML documents into the
Document Object Model (DOM) data structure [4]. The
DOM is a platform and language neutral interface that 
allows programs and scripts to dynamically access and 
update the contents, structures and styles of documents. 
The DOM provides a standard set of objects for repre- 
senting HTML and XML documents, a standard model of 
how these objects can be combined, and a standard 
interface for accessing and manipulating them. It closely 
resembles the structure of the documents it models. Using 
DOM as input, the "DOM-AMOS II Back End" commu- 
nicates with AMOS II to call AMOSQL statements 
dynamically. The user finally submits AMOSQL queries 
to the AMOS II server to access the XML data. 
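To make the role of the DOM concrete, here is a minimal sketch of the traversal a back end performs. It is written in Python with the standard `xml.dom.minidom` module as a stand-in for the Java wrapper and IBM's parser, and it parses Example 2 with the DOCTYPE line dropped so that no external DTD is fetched. The helpers `element_children`, `text_value`, and `walk` are hypothetical; they only collect the (parent tag, tag, attributes, text) events that a back end could turn into AMOSQL statements.

```python
from xml.dom import minidom

# Example 2 without the DOCTYPE line, so no external DTD file is needed.
DOC = """<person id="669">
  <employee>
    <family> Lin </family>
    <given> Hui </given>
  </employee>
</person>"""


def element_children(node):
    """Return the element (non-text) children of a DOM node."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def text_value(node):
    """Concatenate the direct text children of an element, stripped."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE).strip()


def walk(node, parent_tag=None, out=None):
    """Depth-first walk collecting (parent tag, tag, attributes, text) tuples."""
    if out is None:
        out = []
    attrs = dict(node.attributes.items())
    out.append((parent_tag, node.tagName, attrs, text_value(node)))
    for child in element_children(node):
        walk(child, node.tagName, out)
    return out


events = walk(minidom.parseString(DOC).documentElement)
for event in events:
    print(event)
```

The depth-first order matters: a parent is visited before its children, so a parent object can be created before any child is attached to it.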

4 Translation rules 

One challenging issue is to define a way to map the 
data model of an XML document into the data model of 
AMOS II. The DTDs provide schema information, while
the XML documents themselves provide data but also
schema information when DTDs are omitted or incom-
plete. Therefore we design the schema definition rules to
be incremental; i.e., the schema is dynamically extended
while the XML data is read and may be modified to
satisfy new data constraints discovered during reading.

There are two kinds of transformation rules: 

• Rules applied on DTDs; 

• Rules applied while reading XML data. 

In the translation rules below we will show how differ- 
ent kinds of DTD definitions and elements in XML 
documents dynamically extend the schema and content of 
the mediator database. We use the syntax of AMOSQL to 
illustrate this. 



4.1 DTD rules 

The first rule creates a type (class) for an element defi- 
nition: 

Rule 1. A new type is created for every element
definition declared in the DTD to have at least 
one sub-element or attribute or where a 
subelement is declared as "ANY" or 
"EMPTY". 

For example, assume we have the following DTD 
statement: 

<!ELEMENT person (employee)>

Using the above rule, the following statement instructs 
AMOS II to dynamically extend the schema with a new 
type named person being a subtype of the system type 
xml: 

create type person under xml; 

The AmosXML wrapper dynamically executes the 
statement using the Java interfaces. Since the DTD 
specifies that elements tagged person must always contain 
one sub-element tagged employee, a new database type 
named person is created. Objects of that type are created 
when reading elements from the XML file, according to 
the rules in the next subsection. The system type "xml" is
a supertype of all XML types. It has a function data to
store the values of elements when "#PCDATA" is speci-
fied.
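Rule 1 is essentially a predicate over element definitions. The sketch below assumes person.dtd has already been parsed into two Python dictionaries (a hypothetical representation; no DTD parser is shown) and returns the `create type ... under xml;` statements that the wrapper would issue. The names `ELEMENTS`, `ATTLISTS`, `is_type`, and `rule1_statements` are illustrative only.

```python
# person.dtd (Example 1) in a hypothetical pre-parsed form: each element tag
# maps to either a list of sub-element tags or one of "#PCDATA", "ANY", "EMPTY".
ELEMENTS = {
    "person": ["employee"],
    "employee": ["family", "given"],
    "family": "#PCDATA",
    "given": "#PCDATA",
    "email": "ANY",
}
# Attributes declared with <!ATTLIST ...>, per element tag.
ATTLISTS = {"person": ["id"]}


def is_type(tag, elements, attlists):
    """Rule 1: true if the definition has a sub-element or an attribute,
    or if its content is declared as ANY or EMPTY."""
    content = elements[tag]
    has_subelements = isinstance(content, list) and len(content) > 0
    has_attributes = len(attlists.get(tag, [])) > 0
    return has_subelements or has_attributes or content in ("ANY", "EMPTY")


def rule1_statements(elements, attlists):
    """Emit one AMOSQL 'create type' statement per definition selected by Rule 1."""
    return [f"create type {tag} under xml;"
            for tag in elements if is_type(tag, elements, attlists)]


for stmt in rule1_statements(ELEMENTS, ATTLISTS):
    print(stmt)
```

Here email becomes a type only because its content is ANY; family and given are left to Rule 2.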

Rule 2 creates a function, rather than a type, for each 
leaf element defined in the DTD. By creating such prop- 
erty functions the queries to the database become simpler 
containing fewer levels of indirection, as shown below. 

Rule 2. If an element definition E does not have any 
subelement definitions (i.e. is a leaf element), 
then tag E is represented as a stored function 
(i.e. attribute), also named E, on the type 
(class) representing its parent element 
definition F. The function represents the value 
of an element as a string, i.e. its signature is 
E(F)->charstring. This function is called a 
property function. 

Consider the following part of the DTD person.dtd 
above: 

<!ELEMENT employee (family, given)>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>

After applying Rule 2 in our example, the following 
AMOSQL statements dynamically define two new
functions in the mediator: 

create function family(employee)->charstring as stored;
create function given(employee)->charstring as stored;
The clause "as stored" specifies that the functions rep-
resent attributes (i.e. contain explicitly stored values).

In this case the element definitions family and given do 
not contain any subelements, so two property functions 
are created returning the type charstring, rather than two
new types. An example of a query to this database is: 

select family(e) from employee e where given(e) = "Hui";

Without Rule 2, family and given would
have been represented as types (classes) with the values 
of the elements stored in the function data. We would 
then have the following schema created instead: 

create type family under xml; 
create type given under xml; 

The above query would then have looked like this: 

select data(f) from employee e, family f
where family(e) = f and data(given(e)) = "Hui";

This is clearly a more complex and less natural query. 
By representing leaf elements as functions most calls to 
the data function are avoided and fewer types (classes) 
are needed. With Rule 2 the data function only needs to be
used for accessing elements having both subelements and
#PCDATA specified.
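Rule 2 can be sketched over an assumed pre-parsed form of person.dtd: for every definition that Rule 1 turns into a type, each sub-element that is not itself a type yields a stored property function on the parent. The dictionary layout and function names below are hypothetical, and `is_type` simply restates Rule 1 so the example runs on its own.

```python
# Hypothetical pre-parsed form of person.dtd (Example 1).
ELEMENTS = {
    "person": ["employee"],
    "employee": ["family", "given"],
    "family": "#PCDATA",
    "given": "#PCDATA",
    "email": "ANY",
}
ATTLISTS = {"person": ["id"]}


def is_type(tag, elements, attlists):
    """Rule 1, restated so that this sketch is self-contained."""
    content = elements[tag]
    return ((isinstance(content, list) and len(content) > 0)
            or len(attlists.get(tag, [])) > 0
            or content in ("ANY", "EMPTY"))


def rule2_statements(elements, attlists):
    """Rule 2: each leaf sub-element E of a type F becomes a stored
    property function E(F)->charstring instead of a new type."""
    stmts = []
    for parent, content in elements.items():
        if not isinstance(content, list) or not is_type(parent, elements, attlists):
            continue  # only definitions listing sub-elements can contain leaves
        for child in content:
            if not is_type(child, elements, attlists):
                stmts.append(f"create function {child}({parent})->charstring as stored;")
    return stmts


for stmt in rule2_statements(ELEMENTS, ATTLISTS):
    print(stmt)
```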

For element definitions represented in the mediator as
types having subelements also represented as types, Rule 
1 generates a new type. The following rule then generates 
a containment function to represent the element- 
subelement relationship between the new type and the 
elements it contains: 

Rule 3. A containment function is generated for an 
element definition represented in the mediator 
as a type (class) F that has a subelement E 
also represented as a type (class). It returns 
the collection of subelement objects contained 
in a given object. Its signature is E(F)->bag
of E.

For example, consider the following DTD: 

<!ELEMENT person (employee)>
<!ELEMENT employee (family, given)>

It generates the following containment function defini- 
tion: 

create function employee(person)->bag of employee
as stored; 

The containment function employee(person)->bag of 
employee returns a bag (set with duplicates) of employees 
since the element definition employee has one or several 
subelements. 
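A corresponding sketch of Rule 3, again over a hypothetical dictionary encoding of the two DTD lines above (with Rule 1 restated as `is_type`), emits a containment function only when both the parent and the sub-element are types:

```python
# Hypothetical pre-parsed form of the two element definitions above.
ELEMENTS = {
    "person": ["employee"],
    "employee": ["family", "given"],
    "family": "#PCDATA",
    "given": "#PCDATA",
}
ATTLISTS = {}  # no attribute declarations in this fragment


def is_type(tag, elements, attlists):
    """Rule 1, restated so that this sketch is self-contained."""
    content = elements[tag]
    return ((isinstance(content, list) and len(content) > 0)
            or len(attlists.get(tag, [])) > 0
            or content in ("ANY", "EMPTY"))


def rule3_statements(elements, attlists):
    """Rule 3: for a type F with a sub-element E that is also a type,
    create the containment function E(F)->bag of E."""
    stmts = []
    for parent, content in elements.items():
        if not isinstance(content, list) or not is_type(parent, elements, attlists):
            continue
        for child in content:
            if is_type(child, elements, attlists):
                stmts.append(f"create function {child}({parent})->bag of {child} as stored;")
    return stmts


print(rule3_statements(ELEMENTS, ATTLISTS))
```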

Attribute definitions generate functions prefixed with
"attribute_" to distinguish them from property functions:



Rule 4. An attribute function is created for each XML 
attribute defined in a DTD (using ATTLIST) to
represent the attribute of each element. 

For example, 

<!ATTLIST person id ID #REQUIRED>

is translated into the following function definition: 

create function attribute_id(person)->charstring
as stored; 

The function attribute_id(person) represents values of
the attribute id for objects of type person. 
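Rule 4 needs little more than the element and attribute names from each attribute-list declaration. The sketch below assumes declarations that name a single attribute, like the one above, and extracts them with a regular expression; the pattern and the name `rule4_statement` are illustrative assumptions, and multi-attribute ATTLIST declarations are deliberately rejected rather than parsed.

```python
import re

# Matches single-attribute declarations such as <!ATTLIST person id ID #REQUIRED>:
# element name, attribute name, attribute type, default declaration.
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*>")


def rule4_statement(decl):
    """Rule 4: attribute A of element F becomes attribute_A(F)->charstring.
    The 'attribute_' prefix separates it from Rule 2 property functions."""
    m = ATTLIST_RE.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"unsupported ATTLIST declaration: {decl!r}")
    element, attribute = m.group(1), m.group(2)
    return f"create function attribute_{attribute}({element})->charstring as stored;"


print(rule4_statement("<!ATTLIST person id ID #REQUIRED>"))
```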

4.2 XML data rules 

The XML data rules dynamically add data to the me- 
diator database while reading XML documents. They may 
also modify the schema in case the DTD is incomplete or 
omitted. 

Rule 5 concerns creating objects for elements repre- 
sented as types: 

Rule 5. When reading an element, a test is made to 
check if its tag is previously represented as a 
type. If so, a new object O of that type is 
created. If the element has a value the data 
function data(O) is set to the contents of the 
element. If O is not the root element it is also 
added to the containment function of its 
parent element. 

For example, if the tags person and employee are pre- 
viously represented as types, the following XML docu- 
ment 

<!DOCTYPE person SYSTEM "person.dtd">
<person>
  <employee>
    Lin
  </employee>
</person>

generates calls to the following AMOSQL statements: 

create person instances :p; 
create employee instances :e; 
set data(:e) = "Lin";
add employee (:p) = :e; 

In this example, Rule 5 first generates a new object of 
type person using the statement create. It has no value 
and is the root element of the document. Then the same 
rule creates an object of type employee and the data 
function of the new object is set to the contents of the 
element using the set statement. The new object is also 
added to the extent of the parent's containment function 
employee(person)->bag of employee using the add
statement. 
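The data-loading side of Rule 5 can be sketched with Python's `xml.dom.minidom` as a stand-in for the Java DOM parser. Assumptions: the set of tags already represented as types is given, object variables are named `:o1`, `:o2`, ... instead of `:p` and `:e`, quotes inside text are not escaped, and the function returns AMOSQL statement strings instead of executing them.

```python
from itertools import count
from xml.dom import minidom

# Tags already represented as types (e.g. created by Rule 1 from person.dtd).
TYPES = {"person", "employee"}


def direct_text(node):
    """Concatenate an element's own text children, stripped of whitespace."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE).strip()


def rule5_statements(xml_text, types):
    """Rule 5: create an object for each element whose tag is a type, set its
    data function if it has text, and add it to the parent's containment function."""
    stmts = []
    ids = count(1)  # hypothetical naming scheme for object variables

    def visit(node, parent_var):
        if node.tagName not in types:
            return  # tags represented otherwise are left to other rules
        var = f":o{next(ids)}"
        stmts.append(f"create {node.tagName} instances {var};")
        text = direct_text(node)
        if text:
            stmts.append(f'set data({var}) = "{text}";')  # quotes not escaped
        if parent_var is not None:
            stmts.append(f"add {node.tagName}({parent_var}) = {var};")
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                visit(child, var)

    visit(minidom.parseString(xml_text).documentElement, None)
    return stmts


DOC = "<person><employee>Lin</employee></person>"
for stmt in rule5_statements(DOC, TYPES):
    print(stmt)
```

The depth-first traversal guarantees that a parent object exists before a child is added to its containment function.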
Rule 6 complements Rule 5 when the tag was previ-
ously represented as a property function rather than as a
type: 

Rule 6. If the tag of an element is previously 
represented as a property function on its 
parent (rather than as a type), it is added to 
the extent of that property function. 

For example, if the tag person is previously repre- 
sented as a type, and tag email is not previously repre- 
sented as a type, but as a property function
email(person)->charstring, the following XML document

<!DOCTYPE person SYSTEM "person.dtd">
<person>
  <email>
    Hui.Lin@dis.uu.se
  </email>
</person>

generates calls to the following AMOSQL statements: 

create person instances :p;
add email(:p) = "Hui.Lin@dis.uu.se";

In this example, Rule 5 generates a new object of type
person using the statement create. It has no value and is
the root element of the document. Then, according to
Rule 6, we populate the property function

email(person)->charstring.
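Rules 5 and 6 together decide what happens to every element when the schema is already known. The minimal sketch below, with hypothetical names and `xml.dom.minidom` standing in for the wrapper's Java parser, keeps the known property functions as (E, F) pairs and replays the email example:

```python
from itertools import count
from xml.dom import minidom

TYPES = {"person"}                  # tags represented as types
PROPERTIES = {("email", "person")}  # (E, F): property function E(F)->charstring exists


def direct_text(node):
    """Concatenate an element's own text children, stripped of whitespace."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE).strip()


def load_known_schema(xml_text, types, properties):
    """Apply Rule 5 to type tags and Rule 6 to property-function tags."""
    stmts = []
    ids = count(1)

    def visit(node, parent_tag, parent_var):
        tag = node.tagName
        if tag in types:                                # Rule 5
            var = f":o{next(ids)}"
            stmts.append(f"create {tag} instances {var};")
            text = direct_text(node)
            if text:
                stmts.append(f'set data({var}) = "{text}";')
            if parent_var is not None:
                stmts.append(f"add {tag}({parent_var}) = {var};")
            for child in node.childNodes:
                if child.nodeType == child.ELEMENT_NODE:
                    visit(child, tag, var)
        elif (tag, parent_tag) in properties:           # Rule 6
            stmts.append(f'add {tag}({parent_var}) = "{direct_text(node)}";')
        # any other tag would need the schema-extending rules (Rules 7-10)

    visit(minidom.parseString(xml_text).documentElement, None, None)
    return stmts


DOC = "<person><email> Hui.Lin@dis.uu.se </email></person>"
print(load_known_schema(DOC, TYPES, PROPERTIES))
```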

The next rules apply when there is no DTD or when 
the DTD is incomplete ("ANY" specified). In such cases 
the schema is either extended or modified depending on 
how the tag is represented so far.

The following definition is used below: 

Definition: A sub-element E of F is an unspecified 
subelement of F if either no DTD definition exists for E or 
F is specified as "ANY" in the DTD. 

Rule 7 dynamically creates a new containment func- 
tion the first time a containment relationship is discovered 
while reading an XML document: 

Rule 7. If 1) an element tagged E is an unspecified
subelement of an element tagged F; 2) E is
represented as a type; 3) there is no previously
defined containment function E(F)->bag of E; then we
dynamically create a new containment
function E(F)->bag of E.

Assume now the following XML document without 
DTD: 

<person>
  <employee>
    <family> Lin </family>
  </employee>
</person>



In this case Rule 7 applies since 

• employee is an unspecified subelement of person (it is
the first time that employee shows up),

• tag employee is represented as a type (Rule 1),

• there is no previous containment function
employee(person)->bag of employee.

Following Rule 7, the following containment function 
is dynamically generated when the XML document is 
read: 

create function employee(person)->bag of employee as stored;

The containment function is created in order to estab- 
lish the discovered containment relationship between 
types person and employee. 
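The sketch below applies Rule 7 to a document without a DTD, so every sub-element is unspecified. To decide which tags are types it assumes the convention of Rule 10 below (root elements and elements with sub-elements or attributes); the `schema` dictionary and helper names are hypothetical. Because the schema is updated in place, reading the same document twice creates nothing the second time.

```python
from xml.dom import minidom


def element_children(node):
    """Return the element (non-text) children of a DOM node."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def represented_as_type(node, is_root):
    """Type decision assumed here (Rule 10): root elements and elements
    with sub-elements or attributes are types."""
    return is_root or bool(element_children(node)) or node.hasAttributes()


def rule7_statements(xml_text, schema):
    """Rule 7: create E(F)->bag of E the first time a type E is found inside F.
    `schema` holds the sets 'types' and 'containment' and is updated in place."""
    stmts = []

    def visit(node, parent_tag, is_root):
        tag = node.tagName
        if represented_as_type(node, is_root) and tag not in schema["types"]:
            schema["types"].add(tag)
            stmts.append(f"create type {tag} under xml;")
        if (parent_tag is not None and tag in schema["types"]
                and (tag, parent_tag) not in schema["containment"]):
            schema["containment"].add((tag, parent_tag))
            stmts.append(f"create function {tag}({parent_tag})->bag of {tag} as stored;")
        for child in element_children(node):
            visit(child, tag, False)

    visit(minidom.parseString(xml_text).documentElement, None, True)
    return stmts


schema = {"types": set(), "containment": set()}
DOC = "<person><employee><family> Lin </family></employee></person>"
print(rule7_statements(DOC, schema))
print(rule7_statements(DOC, schema))  # second read: nothing new to create
```

The leaf family is not turned into a type here; leaves are handled by Rule 8.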

The next rule complements Rule 7 when E is not a 
type: 

Rule 8. If 1) an element tagged E is an unspecified
subelement of an element tagged F; 2) E is not
previously encountered; 3) the element tagged
E has no subelements; then we dynamically
create a property function E(F)->charstring.

Assume the following XML document not having any 
DTD: 

<person>
  <email> Hui.Lin@dis.uu.se </email>
</person>

In this case Rule 8 applies since the element email is
an unspecified subelement of the element person, tag
email has not been encountered before, and the ele-
ment has no subelements. The following function defini-
tion is dynamically created:

create function email(person)->charstring as stored;

Notice that this rule will apply only once for each kind 
of element. 
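Rule 8 can be sketched the same way. The set of tags seen so far lives in a hypothetical `schema` dictionary. In line with Rule 10, a leaf that carries attributes is also excluded, which is an assumption beyond the three conditions of Rule 8 as stated.

```python
from xml.dom import minidom


def element_children(node):
    """Return the element (non-text) children of a DOM node."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def rule8_statements(xml_text, schema):
    """Rule 8: an unspecified sub-element E of F that has not been seen before
    and has no sub-elements gets a property function E(F)->charstring.
    `schema` holds the set 'seen' of tags encountered so far and is updated."""
    stmts = []

    def visit(node, parent_tag):
        tag = node.tagName
        kids = element_children(node)
        # Attributes are also excluded, since Rule 10 forces such tags to be types.
        if (parent_tag is not None and tag not in schema["seen"]
                and not kids and not node.hasAttributes()):
            stmts.append(f"create function {tag}({parent_tag})->charstring as stored;")
        schema["seen"].add(tag)
        for child in kids:
            visit(child, tag)

    visit(minidom.parseString(xml_text).documentElement, None)
    return stmts


schema = {"seen": set()}
DOC = "<person><email> Hui.Lin@dis.uu.se </email></person>"
print(rule8_statements(DOC, schema))
print(rule8_statements(DOC, schema))  # the rule applies only once per tag
```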

Rule 9 dynamically converts a previously defined 
property function to a type when the element is discov- 
ered to have subelements: 

Rule 9. If 1) an element tagged E is an unspecified
subelement of an element tagged F; 2) E is
previously represented as a property function
E(F)->charstring; 3) the element tagged E
has its own subelements or attributes; then we
dynamically create a new type representing E
and a containment function with signature
E(F)->bag of E. The property function
E(F)->charstring is converted into the containment
function, and the containment function is
updated accordingly.

For example, assume the following XML document 
without associated DTD: 
<person>
  <name>Hui Lin</name>
</person>
<person>
  <name><family> Lin </family></name>
</person>

According to Rule 8, the following statements are gen- 
erated when parsing the first element person: 

create type person under xml; 

create function name(person)->charstring as stored;
create person instances :p;
set name(:p) = "Hui Lin";

A new type is created when reading the first occur- 
rence of element person since it contains subelement 
name. Since element name has no subelement a property 
function name(person)->charstring is created and popu- 
lated. 

However, when later reading the second occurrence of 
element person, an element family is discovered as a 
subelement of the element name. Thus the property 
function name must be migrated to a type. According to 
Rule 9, the following statements are generated: 

create type name under xml;
create name instances :m;
set data(:m) = name(:p);
create function name(person)->bag of name as stored;
set name(:p) = :m;
create person instances :pn;
create name instances :n;
set name(:pn) = :n;

In this case, a new type is created for the element name
since it contains a subelement family. The property
function name(person)->charstring is converted into a
containment function name(person)->bag of name, and popu-
lated with :m (from the old property function) and :n (the
new object).

Furthermore, we also need to generate a property func-
tion family(name)->charstring according to Rule 8, since
the element family has no subelement.

create function family(name)->charstring as stored;
set family(:n) = "Lin";
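The whole Rule 9 example can be replayed with a small stateful sketch that combines Rules 6 to 10: leaves become property functions (Rule 8), and when a tag later turns out to have sub-elements its property function is migrated to a type plus a containment function (Rule 9). This is a hypothetical Python model of the mediator state, not the AmosXML implementation: the two person elements are loaded as two separate documents (an XML document may have only one root), object variables are named `:o1`, `:o2`, ..., containment functions are populated with `add`, and, like the listing above, it does not show how the old property function is dropped.

```python
from itertools import count
from xml.dom import minidom


def element_children(node):
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def direct_text(node):
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE).strip()


class IncrementalSchema:
    """Hypothetical mediator-side state for documents without a DTD."""

    def __init__(self):
        self.types = set()
        self.containment = set()  # (E, F): E(F)->bag of E exists
        self.properties = {}      # (E, F) -> {parent object variable: value}
        self.stmts = []
        self._ids = count(1)

    def _new_var(self):
        return f":o{next(self._ids)}"

    def _ensure_type(self, tag):
        if tag not in self.types:
            self.types.add(tag)
            self.stmts.append(f"create type {tag} under xml;")

    def _migrate(self, tag, parent):
        """Rule 9: replace tag(parent)->charstring by a type and a containment
        function, keeping each old string value as the data of a new object."""
        old_values = self.properties.pop((tag, parent))
        self._ensure_type(tag)
        moved = []
        for parent_var in old_values:
            var = self._new_var()
            self.stmts.append(f"create {tag} instances {var};")
            self.stmts.append(f"set data({var}) = {tag}({parent_var});")
            moved.append((parent_var, var))
        self.stmts.append(f"create function {tag}({parent})->bag of {tag} as stored;")
        self.containment.add((tag, parent))
        for parent_var, var in moved:
            self.stmts.append(f"add {tag}({parent_var}) = {var};")

    def load(self, xml_text):
        self._visit(minidom.parseString(xml_text).documentElement, None, None)

    def _visit(self, node, parent, parent_var):
        tag, kids = node.tagName, element_children(node)
        # Rule 10 (plus tags already migrated): root, sub-elements, or attributes.
        if parent is None or kids or node.hasAttributes() or tag in self.types:
            if (tag, parent) in self.properties:
                self._migrate(tag, parent)                     # Rule 9
            self._ensure_type(tag)
            if parent is not None and (tag, parent) not in self.containment:
                self.containment.add((tag, parent))            # Rule 7
                self.stmts.append(f"create function {tag}({parent})->bag of {tag} as stored;")
            var = self._new_var()
            self.stmts.append(f"create {tag} instances {var};")
            if direct_text(node):
                self.stmts.append(f'set data({var}) = "{direct_text(node)}";')
            if parent_var is not None:
                self.stmts.append(f"add {tag}({parent_var}) = {var};")
            for child in kids:
                self._visit(child, tag, var)
        else:
            if (tag, parent) not in self.properties:           # Rule 8
                self.properties[(tag, parent)] = {}
                self.stmts.append(f"create function {tag}({parent})->charstring as stored;")
            value = direct_text(node)
            self.properties[(tag, parent)][parent_var] = value
            self.stmts.append(f'set {tag}({parent_var}) = "{value}";')


schema = IncrementalSchema()
schema.load("<person><name>Hui Lin</name></person>")
schema.load("<person><name><family> Lin </family></name></person>")
for stmt in schema.stmts:
    print(stmt)
```

The statement order differs slightly from the listing above (for example, the first person object is created before name(person)->charstring is defined), but the migration follows the same sequence: copy the old value into a new name object, redefine name(person) as a containment function, and link the objects.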

Finally we need a special overriding rule to guarantee 
that there will always be an object representing root 
elements and elements having subelements or attributes: 

Rule 10. Tags of elements that 1) are root
elements, 2) have sub-elements, or 3) have
attributes are always represented as types.

For example, in the uncommon event that the root
element of an XML document does not have any subele-
ments, it will still be represented as an object of a type.
We will not elaborate on this case further here.



5 Conclusion and future work 

We described the architecture of a wrapper called 
AmosXML that allows parsing and querying XML 
documents from an object-oriented mediator system. 
Furthermore, incremental translation rules were described 
that infer OO schema elements while reading DTD 
definitions or XML documents. Some rules infer the OO 
schema from the DTD, when available. For XML docu- 
ments without DTDs, or when the DTD is incomplete, 
other rules incrementally infer the OO schema from the 
contents of the accessed XML documents. The discovery 
of OO schema structures combined with other OO media-
tion facilities in AMOS II [6][10][11] allows the specifica-
tion of OO queries and views over data from XML docu-
ments combined with data from other data sources. The
incremental nature of the translation rules allows them to
be applied in a streamed fashion, which is important for 
large data files and when the network communication is 
slow or bursty. 

There are several possible directions for our future 
work: 

• The current rules do not infer any inheritance; a
flat type hierarchy is generated. Rules should be added
to infer inheritance hierarchies, e.g. by using 
behavioral definitions of types [13] where a type is 
defined by its behavior (i.e. its attributes and methods). 
In our case this means that a type T is defined as a 
subtype of another type U if the set of functions on U 
is a subset of the set of functions on T. 
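The subtype criterion proposed above can be sketched as a subset test over function sets. In this hypothetical Python version, functions are compared by name only (ignoring that, in AMOS II, functions on different types have different argument types), a proper subset is required so that types with identical function sets are not made subtypes of each other, and pairs implied by transitivity are dropped to obtain immediate supertypes. The type and function names are invented for the example.

```python
def inferred_subtypes(functions_by_type):
    """All (T, U) pairs where U's function set is a proper subset of T's,
    i.e. T would be inferred as a subtype of U."""
    return {(t, u)
            for t, ft in functions_by_type.items()
            for u, fu in functions_by_type.items()
            if t != u and fu < ft}


def direct_supertypes(functions_by_type):
    """Keep only immediate supertypes by dropping pairs implied by transitivity."""
    pairs = inferred_subtypes(functions_by_type)
    return sorted((t, u) for t, u in pairs
                  if not any((t, v) in pairs and (v, u) in pairs
                             for v in functions_by_type))


# Hypothetical function sets of three generated types.
FUNCTIONS = {
    "person": {"name"},
    "employee": {"name", "salary"},
    "manager": {"name", "salary", "dept"},
}
print(direct_supertypes(FUNCTIONS))
```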

• Integrating XML data involves efficient processing of 
queries over many relatively small XML data files 
described by several layers of meta-data descriptions 
and links. For example, there can be 'master' XML 
documents having links to other XML documents and 
DTDs. Therefore the query language needs to be able 
to transparently express queries referencing both XML 
data and the meta-data in the master documents. New 
techniques are needed to specify and
efficiently process queries over such multi-layered 
meta-data. 

• The conventional exhaustive cost-based query 
processing techniques do not scale over large numbers 
of distributed XML documents. New distributed 
heuristic techniques need to be developed for this. 